Abstract

The Numerical Aerodynamic Simulation (NAS) Facility has a long standing practice of
maintaining buildable source code for installed hardware. There are two reasons for this: NAS'
designated pathfinding role, and the need to maintain a smoothly running operational capacity
given the widely diversified nature of the vendor installations. NAS has a need to maintain
support capabilities when vendors are not able; to diagnose and remedy hardware or software
problems where applicable; and to support ongoing system software development activities
whether or not the relevant vendors feel support is justified. This note provides an informal
history of these activities at NAS, and brings together the general principles that drive the
requirement that systems integrated into the NAS environment run binaries built from source
code, onsite.

This is a revision of Russell Carter’s June 1994 report: RND-94-007.

1.0 Introduction
2.0 Practical Considerations
3.0 Experiences at NAS
3.1 Documentation
3.2 Bug identification, patching, and resolution of vendor design flaws
3.3 Software Development
4.0 Vendor Data Rights
5.0 Summary
6.0 Acknowledgments


1.0 Introduction 

The NAS program at Ames Research Center [...] featuring [...] systems, which are
interconnected via one or more high-performance LANs. NAS supports over fifteen [...] and
450 systems [...]. Each subsystem is characterized by [...] solution, which must
coexist peacefully in an open environment. The NAS [...] is responsible for the continued
smooth functioning of this very complex environment.

[...] high performance system integration. Achieving this requires the ability to [...]
systems, regardless of vendor. Thus the [...] diversity, and accompanying challenges,
inherent in the NAS operational [...].

It is critical for the tools and techniques that are used to maintain the effectiveness of the NAS 
[...] Convex, BSDI, StorageTek, Ultra, IBM, and Intel Supercomputing Systems Division)
[...] source code requirement, particularly from vendors unfamiliar with or new to the NAS
[...] have for system source code:

• Documentation
• Bug identification, patching, and resolution of vendor design flaws
• Software development

Then follows a discussion of vendor data rights, and some of the procedures that we use to
protect them. Finally, there is a summary. It is important to note that no proprietary code is
transferred to unlicensed platforms at NAS, even for projects that require porting.

2.0 Practical Considerations 

[...] multivendor site like NAS. The product may work well under [...]
and configuration issues, it may not work as intended for NAS.

Reasons for the lack of support are varied but are usually one or more of the following:

• Hard problems! (Complex systems have complex problems!)
• Management issues:
    Magnitude of support task underestimated
    Poor internal communications
    Support staff not competent enough
    Insufficient engineering resources
• Firm having financial difficulties
• Conflicts with marketing/sales goals

[...] to recognize an issue as a bug due to sales/marketing ramifications.


3.0 Experiences at NAS 

the list maintained for these examples has many more entries. 


3.1 Documentation 

The provided documentation on the types of systems at NAS is often out of date with
the installed software or even just wrong. Some documentation may be out of date due to
[...] early releases or beta software. Others are out of date even though [...]
official release. Fundamental tasks, such as adding a device driver, say, for a HiPPI-attached
[...] documentation [...] (for instance, [...]) are available in the source code.

• [...] filled in by using source code. Exactly how the scheduler determines an
"interactive" process was discovered this way.

• The Portable Batch System's (PBS) resource monitor needs to be able to extract
information from the system kernel concerning global-resource availability and
utilization, and about per-session resource utilization. PBS's machine-oriented
miniserver module has similar needs to control session limits. Generally,
documentation of the necessary kernel interfaces is sketchy enough that reference
to code is required to find out how the interfaces work.

• [...] source guarantees that you have the correct source. Problems with support
distributions were uncovered by building from source onsite.

• [...] confronted with multi-cast traffic. In the absence of source code this bug has
been, and remains, unresolved as of 8/96.

• [...] "gated". The SGI version does not support router discovery, and [...] the
default route when multiple interfaces are present on the machine. [...] in IRIX 6.2
requires access to [...].

• [...] ignore duplicate packet requests. The vendor blamed the network [...].

• [...] were not properly updated during the normal execution of NQS.

• A CRI exception-handling failure resulted in a "Kill [...]" being sent to
process 1. This resulted in a [...].

• [...] These averages [...].

• [...] The fault was traced [...] nodes to the compute nodes. Examination of the
source disclosed the [...] and allowed its on-site resolution.

• The delivered IBM SP-2 NFS implementation buffers NFS writes (a protocol
violation). When a parallel job is killed because it exceeds its time allocation, the
process is blocked inside the kernel until the buffers are flushed. Given the amount
of data that may be buffered, and the small I/O bandwidth to the NFS server, this
could take up to an hour. During this time, the unkillable process owns the switch,
so any subsequent jobs started on this node will fail. Because this feature is
tightly integrated with the IBM virtual memory design, development efforts have
been directed to making PIOFS functional instead of attempting to fix NFS.

• The Intel Paragon virtual memory system had numerous design and
implementation flaws. Examination of the source provided avenues for testing,
verification of problems, and workarounds. System fixes were provided to Intel for
incorporation into the release source.

• Systems cannot be verified as secure in the absence of source. Delivered binaries
are always suspect. Known or discovered security weaknesses require source for
resolution.

• The addition of batch scheduling software is vastly improved with system source.
Scheduling software requires an understanding of how to interact with the native
system scheduler. The source code is the best documentation for this activity.

• The ULTRAnet-provided FTP client crashed if the length of the user's shell name
was a multiple of 4 characters. Without source code, this bug could not have been
corrected.
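
The points above, that delivered binaries are always suspect and that building onsite
guarantees you have the correct source, can be illustrated with a small sketch. It compares
a vendor-delivered tree against a tree built onsite, file by file, using SHA-256 digests. The
function names and directory layout are hypothetical, and a digest mismatch is only a prompt
for investigation: compilers, timestamps, and build options can change a binary without any
change in source.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(delivered: Path, built: Path) -> dict[str, list[str]]:
    """Report files that differ or that exist on only one side."""
    d, b = tree_digests(delivered), tree_digests(built)
    return {
        "differ": sorted(k for k in d.keys() & b.keys() if d[k] != b[k]),
        "only_delivered": sorted(d.keys() - b.keys()),
        "only_built": sorted(b.keys() - d.keys()),
    }
```

Calling `compare_trees(Path("delivered"), Path("built"))` returns three sorted lists of
relative paths; the directory names here are placeholders.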

3.2 Bug identification, patching, and resolution of vendor design flaws 

There are numerous examples of site-critical bugs fixed using source code before the vendor 
could (or would) respond to a bug report. 

• Incorrect padding of tape blocks in a tape driver ruined backups. Botched system
scheduling parameters required fixing.

• The Proteon Ring evaluation required driver debugging using UNICOS source
code.

• Early SGI releases included a bug that prevented telnet connections to the Cray-2.
Source code was required to fix the bug.

• Lack of Informix source prevented on-site resolution of commit function bugs. In
the absence of vendor cooperation, a workaround replacement was instituted.

• Using source code, the Morris Virus sendmail bug was fixed within two days on all
systems except SUN. Without source code for SUN sendmail, the fix for SUN
systems took six months.

• Resolution of vendor-vendor incompatibilities is facilitated through the use of
diagnostic traps implemented in the operating systems, as was the case with Cray
and NSC.

• iPSC/860 remote host software was ported to SGI systems. Without this added
functionality, the iPSC/860 could not have supported the NAS workload. NAS
personnel successfully carried out the port.
• [...] Routing Information Protocol package [...] in order to support the [...].
Workarounds for other problems were implemented in this manner as well.

• [...] system, NAS personnel diagnosed an Amdahl-generated [...] error. Access to
source code allowed the implementation of file [...]. Amdahl had never bothered to
[...] the file system and rescue [...].

• [...] documented and fixed at least 12 serious CONVEX kernel bugs thanks to [...]
the source code. These bug fixes were later incorporated into the Convex [...]. They
were in the areas of file system code; networking code; and specific device drivers,
like the SCSI driver, the HiPPI driver, and the TLI (Block Mux Channel) [...]. [...]
of source would have meant perhaps weeks of waiting for [...] from Convex.

• [...] fix at least two problems with [...] server software (ACSLS). [...] fixes were
submitted to STK and later incorporated [...].

• [...] modifications have been made to the CRI-supplied UNICOS system. They include:

    1. [...] This functionality is now incorporated in release distributions from CRI.
    2. [...] ident. Functionality now incorporated in release distributions from CRI.
    3. Security modifications.
    4. [...] specification of [...].
    5. [...] identifiers for accounting.
    6. Disk Highwater mark.
    7. [...] command which adjusts block size depending on device characteristics.

• [...] the Session Reservable File System (SRFS). By using SRFS, system file space is
guaranteed to a job during [...] execution. SRFS facilitates the allocation of file
space on [...] including the CRI solid-state disk device. This major enhancement to
UNIX was only possible because of access to system source.

• [...] security policies require modification to "login," "password" and [...].

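
The SRFS guarantee described above, file space reserved for a job for the duration of its
session, can be sketched as simple reservation accounting. This is an illustrative model only,
not the SRFS implementation: the class name, method names, and byte-granular bookkeeping are
assumptions made for the example.

```python
class SpaceReservations:
    """Toy model of per-session file-space reservations on one device."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.reserved: dict[str, int] = {}  # session id -> bytes reserved

    def available(self) -> int:
        """Bytes not yet promised to any session."""
        return self.capacity - sum(self.reserved.values())

    def reserve(self, session: str, nbytes: int) -> bool:
        """Grant the reservation only if it can be fully guaranteed."""
        if nbytes < 0 or session in self.reserved or nbytes > self.available():
            return False
        self.reserved[session] = nbytes
        return True

    def release(self, session: str) -> None:
        """Return a session's space to the pool when the session ends."""
        self.reserved.pop(session, None)
```

The design choice worth noting is that a request is refused outright rather than partially
granted, so a session that starts never runs out of the space it was promised.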

3.3 Software Development 


[...] to add new features and [...]. NAS acquires many technologies [...]
a technology work in production implies that we perform development support on the
technology. Without source code, the vast majority of this work would not be possible, and 
NAS would be subjected to the sometimes-unsuitable agendas of vendors for support and 
further development. The following examples illustrate the varieties of NAS development 
needs that are met using source code. 

• The Network Queuing System (NQS) was developed using source code for
targeted systems.

• The Portable Batch System's (PBS) resource monitor needs to be able to extract
information from the system kernel concerning global-resource availability and
utilization, and about per-session resource utilization. PBS's machine-oriented
miniserver module has similar needs to control session limits. Generally,
documentation of the necessary kernel interfaces is sketchy enough that reference
to code is required to find out how they work.

• Amdahl UTS source code enabled the implementation of NAStore and the porting of
TCP/IP when it wasn't available from Amdahl.

• ConvexOS source code allowed us to port NAStore and enhance the OS. Joint
development with Convex resulted in an improvement in file system performance
from 20 MB/s to 150 MB/s.

• Wellfleet router source code allowed the implementation of router discovery. This is
now a product distributed by Wellfleet.

• Source for UNICOS and Cray NQS allowed the implementation of the Session
Reservable File System (SRFS), an enhanced resource management function.
Disk quotas and an Ultra driver (mandatory in the NAS environment) were also
implemented.

• UNICOS source code was needed to develop tools to modify the kernel mount table
and to develop the top and mu (memory usage) commands. Minor hooks were added
for new user-level services such as real-time and cpu-time gid limits.

• Non-intrusive, low-level data collection requires modification to the operating
system. Two successful projects that required access to system source code are
the iPSC/860 Concurrent File System monitoring and a message sizes monitoring
project.

• The Map library on the iPSC/860 (which provides support for multidisciplinary
applications) is a modified version of an Intel message-passing library, successfully
modified with no performance degradation. It allows application programmers to
access enhanced message-passing functionality.

• For the p2d2 project and MPK project, source is required on targeted platforms for
the libraries that support parallel processing (for example: message passing,
collective operations, process creation, termination, locks, events, and critical
sections).

• Access to the Amdahl UTS source code allowed NAS to embark on the native file
system HSM (Hierarchical Storage Manager). It also allowed us to create one of
the very first RAID file systems.

• Access to the Convex source code allowed NAS to port the native file system HSM
to the Convex platform.
• [...] completely redesigned the Convex RAID file system to be more like the one
developed for the Amdahl. The performance improvement [...] on the same hardware
for reads, and from [...] to 100 MB/sec for writes.
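
The PBS example above turns on two kinds of query: global resource availability and
per-session utilization. On a present-day POSIX system, comparable figures are exposed
through documented interfaces, which the sketch below reads via Python's standard library. It
is only an analogue for illustration: it is not how PBS queried the kernels discussed in this
note, it reports on the calling process rather than a session, and the function name and
returned field names are assumptions.

```python
import os
import resource


def snapshot() -> dict[str, float]:
    """Collect a few global and per-process resource figures (POSIX only)."""
    # Global view: run-queue load averages over 1, 5, and 15 minutes.
    load1, load5, load15 = os.getloadavg()
    # Per-process view: CPU time and peak memory of the calling process.
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "load_1min": load1,
        "load_5min": load5,
        "load_15min": load15,
        "cpu_count": float(os.cpu_count() or 1),
        "user_cpu_seconds": usage.ru_utime,
        "system_cpu_seconds": usage.ru_stime,
        # Units differ by platform: kilobytes on Linux, bytes on macOS.
        "max_rss": float(usage.ru_maxrss),
    }
```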

4.0 Vendor Data Rights 

[...] with software vendors to establish protection schemes that meet their requirements.
[...] baseline protection policy, which is then augmented with vendor-specific [...].

[...] access to it. NAS maintains lists of active source-code users, periodically reviews
[...] change, and allows access only to listed users. Before being added to the lists, users
are typically required to meet some vendor pre-requisite, such as signing a non-disclosure
[...]. [...] servants and [...] legally bound [...].

[...] code laboratory to facilitate implementation of vendor-specific [...]. The lab is
equipped with build computers, removable disk drives and key-[...] cabinets. The build
machines can be readily physically disconnected from the NAS [...]. In any case, login access
is restricted to listed users of the source code [...] each system. If network access is
permitted, filtering programs prohibit connections from any but an approved list of IP
addresses (typically workstations belonging to users [...] for access). If necessary,
network connections are encrypted to preclude [...]. [...] capability to enforce almost
any access methodology negotiated [...].

This is equivalent to using the U.S. Government DES algorithm [...] a 56-bit key.
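
The connection filtering described above, in which only an approved list of IP addresses may
connect, amounts to an allowlist check. The sketch below shows that check with Python's
standard `ipaddress` module. The addresses are documentation-range placeholders and the
function name is an assumption; it does not describe the filtering programs NAS actually used.

```python
import ipaddress

# Hypothetical approved workstations and subnets (RFC 5737 documentation ranges).
APPROVED = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def is_approved(peer: str, approved=APPROVED) -> bool:
    """Return True only if the peer address falls inside an approved network."""
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        return False  # unparseable addresses are rejected, not trusted
    return any(addr in net for net in approved)
```

The check denies by default: anything that is not a well-formed address inside a listed
network is refused.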

5.0 Summary 

[...] flawlessly, kernel interfaces are documented completely, [...] with binary
distributions, and vendors are strongly motivated to meet NAS mission critical [...] in a
timely manner. None of these hold in practice. Hence the requirement that NAS [...] systems
[...] software built onsite from source code. NAS access to buildable source code has
demonstrable benefits for the vendor, and historically has been a basis for continual [...]
of the supplied systems, which has [...].

6.0 Acknowledgments 

[...] NAS Systems Division [...]. Without the help of the many persons who have
worked on the complex systems problems [...] at the NAS [...], [...] would not have been
documented. [...] of the information in this note has not previously been formally
documented [...] source code requirements.

Contributors of comments, examples and/or text include: John Lekashman, Dave Tweten,
Bill Kramer, Toby Harness, [...], Bill Nitzberg, Sam Fineberg, Bernard Traversat,
David McNab, Eric Townsend, [...] Thompson, Alfred Nothaft, Doreen Cheng, Jeff Becker,
David Barkai and David Henkel-Wallace.





The Need For Vendor Source Code at NAS 


http://www.nas. nasa.gov/Researc... ports/1 996/HTML/NAS 96-022.html 


[...] which asks us to avoid mentioning that we even have their source code.